Fix system df to count content blobs and deduplicate shared storage#1555
Fix system df to count content blobs and deduplicate shared storage#1555realrajaryan wants to merge 1 commit into
Conversation
1dca65d to
c00c1cf
Compare
Code Coverage
|
jglogan
left a comment
There was a problem hiding this comment.
Think everything looks functionally fine but we shouldn't be accessing the content store directly. The current approach would be broken for any other content store implementation.
| import Foundation | ||
|
|
||
| extension FileManager { | ||
| /// Total bytes allocated on disk for all files in a directory (recursive). |
There was a problem hiding this comment.
The comments should include more details on what is and isn't counted. As far as I can tell:
- Symlinks to files outside
directorywill include those files' sizes in the total. - Symlinks to files inside
directorywill include those files' sizes in the total, resulting in inaccuracy due to multiple counting. - The same issues apply for hard links.
- Symlinks to directories aren't followed (which is good).
Do we need to address the double counting issues, or will the directories we enumerate never have these sorts of things?
| /// Total bytes allocated on disk for all files in a directory (recursive). | ||
| public func allocatedSize(of directory: URL) -> UInt64 { | ||
| guard | ||
| let enumerator = self.enumerator( |
There was a problem hiding this comment.
I added FileDescriptorOps.enumerate() to ContainerizationOS for TOCTOU-safe traversal of directory trees; it might make sense to use here.
| let unpackStrategy = SnapshotStore.defaultUnpackStrategy(initImage: containerSystemConfig.vminit.image) | ||
| let snapshotStore = try SnapshotStore(path: root, unpackStrategy: unpackStrategy, log: log) | ||
| let service = try ImagesService(contentStore: contentStore, imageStore: imageStore, snapshotStore: snapshotStore, log: log) | ||
| let service = try ImagesService(contentStore: contentStore, contentBlobsPath: contentBlobsPath, imageStore: imageStore, snapshotStore: snapshotStore, log: log) |
There was a problem hiding this comment.
@adityaramani should the ImagesService know about the contentBlobsPath? That feels like an encapsulation break.
It's only used for getting the total size of the content store - could we just add a content client call that returns that value?
There was a problem hiding this comment.
Yes +1!
Between container and containerization there are two types are implement the ContentStore protocol
- The
LocalContentStorefrom Containerization - The
RemoteContentStoreClientfrom this repo
They exist because there should be only one entity that is actually touching the files on disk - if we had multiple instances of LocalContentStore going about they could trample over each other and mess with the state of the content store.
The RemoteContentStoreClient type talk to the ContentService (also defined in this file) which houses the one singular instance of LocalContentStore
When you ask the RemoteContentStoreClient to ingest something into the content store it proxies that request through to the LocalContentStore which ensures that all operations happen atomically.
With this change - it slightly breaks that abstraction and makes ImageService directly check a path on disk where the files the LocalContentStore store may be.
To solve the "get me the total size of the content store" problem - we can probably do two things.
- Add that API to the
ContentStoreprotocol and implement it in both the types - Add a list api to the
ContentStorewhich returns a list ofContenthttps://github.com/apple/containerization/blob/main/Sources/ContainerizationOCI/Content/Content.swift#L23 objects (this is probably a little more involved and needs some deliberation)
|
|
||
| extension FileManager { | ||
| /// Total bytes allocated on disk for all files in a directory (recursive). | ||
| public func allocatedSize(of directory: URL) -> UInt64 { |
There was a problem hiding this comment.
We shouldn't write any more code that uses URL for fs paths.
There was a problem hiding this comment.
Also noting that this would be a good candidate for extracting to a FilePathOps utility type in ContainerizationOS: apple/containerization#744.
container system dfomits image content blobs from image size and reclaimable bytes #1526 and [Bug]:container system dfdouble-counts image storage when the same image has multiple tags #1527.Type of Change
Motivation and Context
This PR fixes
system dfto report actual on-disk allocated bytes (content blobs + snapshots) instead of summing per-image snapshot sizes. Orphaned blobs are now included as reclaimable, and storage shared across tags is no longer double counted. Also consolidates three identicalcalculateDirectorySizeimplementations into a sharedFileManager.allocatedSize(of:)extension.Testing